Get last 10 lines of very large text file (> 10GB) in C#?

Read to the end of the file, then seek backwards until you find ten newlines, and then read forward to the end, taking the encoding into consideration. Be sure to handle the case where the number of lines in the file is less than ten. Below is an implementation (in C#, as you tagged this), generalized to find the last numberOfTokens in the file located at path, encoded in encoding, where the token separator is represented by tokenSeparator; the result is returned as a string (this could be improved by returning an IEnumerable that enumerates the tokens).

public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator)
{
    int sizeOfChar = encoding.GetByteCount("\n");
    byte[] buffer = encoding.GetBytes(tokenSeparator);

    using (FileStream fs = new FileStream(path, FileMode.Open))
    {
        Int64 tokenCount = 0;
        Int64 endPosition = fs.Length / sizeOfChar;

        for (Int64 position = sizeOfChar; position < endPosition; position += sizeOfChar)
        {
            fs.Seek(-position, SeekOrigin.End);
            fs.Read(buffer, 0, buffer.Length);

            if (encoding.GetString(buffer) == tokenSeparator)
            {
                tokenCount++;
                if (tokenCount == numberOfTokens)
                {
                    byte[] returnBuffer = new byte[fs.Length - fs.Position];
                    fs.Read(returnBuffer, 0, returnBuffer.Length);
                    return encoding.GetString(returnBuffer);
                }
            }
        }

        // handle case where number of tokens in file is less than numberOfTokens
        fs.Seek(0, SeekOrigin.Begin);
        buffer = new byte[fs.Length];
        fs.Read(buffer, 0, buffer.Length);
        return encoding.GetString(buffer);
    }
}

5 That assumes an encoding where the size of the character is always the same. It could get tricky in other encodings. – Jon Skeet Dec 29 '08 at 20:31

1 And, as Skeet informed me once, the Read method is not guaranteed to read the requested number of bytes. You have to check the return value to determine if you're done reading... – Will Dec 29 '08 at 20:52

@Jon: Variable-length character encoding. Oh joy. – Jason Dec 30 '08 at 2:43

@Will: There are several places where error checking should be added to the code. Thank you, though, for reminding me of one of the nasty facts about Stream.Read. – Jason Dec 30 '08 at 2:45

2 I've noticed this procedure is quite slow when executed on a file of ~4MB. Any suggested improvements? Or other C# examples on tailing files? – GONeale Mar 2 '09 at 5:01

I'd likely just open it as a binary stream, seek to the end, then back up looking for line breaks. Back up 10 (or 11, depending on that last line) to find your 10 lines, then just read to the end and use Encoding.GetString on what you read to get it into string format. Split as desired.
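That back-up-and-scan idea can be sketched as follows. This is a minimal illustration rather than the answerer's own code; it assumes a single-byte encoding such as ASCII, and the method name `ReadTail` is made up for the example:

```csharp
using System;
using System.IO;
using System.Text;

class TailSketch
{
    // Return the last `lineCount` lines of the file at `path`.
    // Assumes one byte per character (e.g. ASCII); a variable-length
    // encoding like UTF-8 needs more care, as noted elsewhere on this page.
    public static string ReadTail(string path, int lineCount)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            if (fs.Length == 0) return string.Empty;

            long pos = fs.Length - 1;

            // A trailing newline terminates the last line; don't count it.
            fs.Seek(pos, SeekOrigin.Begin);
            if (fs.ReadByte() == '\n') pos--;

            int newlines = 0;
            while (pos >= 0)
            {
                fs.Seek(pos, SeekOrigin.Begin);
                if (fs.ReadByte() == '\n')
                {
                    newlines++;
                    if (newlines == lineCount) { pos++; break; }
                }
                pos--;
            }
            if (pos < 0) pos = 0; // fewer than lineCount lines: return the whole file

            fs.Seek(pos, SeekOrigin.Begin);
            var buffer = new byte[fs.Length - pos];
            int read = 0;
            while (read < buffer.Length) // Stream.Read may return fewer bytes than asked
                read += fs.Read(buffer, read, buffer.Length - read);
            return Encoding.ASCII.GetString(buffer);
        }
    }
}
```

Note this issues one seek per byte scanned, which matches the comments above about it being slow; a block-at-a-time variant would be faster.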

Tail is a Unix command that will display the last few lines of a file. There is a Windows version in the Windows Server 2003 Resource Kit.

His tags indicate he's after a C# solution – ctacke Dec 29 '08 at 19:23

3 I noticed that. I just thought I'd throw it out there anyway. – w4g3n3r Dec 29 '08 at 19:38

tip: see a tail version in C# at tail.svn.codeplex.com/svn – lsalamon Jan 20 '10 at 15:49

As the others have suggested, you can go to the end of the file and read backwards, effectively. However, it's slightly tricky - particularly because if you have a variable-length encoding (such as UTF-8) you need to be cunning about making sure you get "whole" characters.

You should be able to use FileStream.Seek() to move to the end of the file, then work your way backwards, looking for \n until you have enough lines.
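For UTF-8 specifically, the "whole characters" concern is manageable: continuation bytes always match the bit pattern 10xxxxxx, so a byte equal to '\n' (0x0A) can never occur inside a multi-byte character, and you can realign to a character boundary by backing up to a lead byte. A small sketch (class and method names are illustrative):

```csharp
using System;

class Utf8Boundary
{
    // In UTF-8, continuation bytes match 10xxxxxx;
    // anything else begins a new character.
    public static bool IsLeadByte(byte b) => (b & 0xC0) != 0x80;

    // Given an arbitrary byte offset, step back to the start of the
    // character containing that offset, so decoding begins on a boundary.
    public static int AlignToCharStart(byte[] data, int offset)
    {
        while (offset > 0 && !IsLeadByte(data[offset]))
            offset--;
        return offset;
    }
}
```

For example, "é" encodes as the two bytes 0xC3 0xA9; landing on the 0xA9 continuation byte and calling AlignToCharStart steps back to the 0xC3 lead byte.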

I'm not sure how efficient it will be, but in Windows PowerShell getting the last ten lines of a file is as easy as Get-Content file.txt | Select-Object -Last 10.

This method is quite sluggish already at ~20 MB files. – Jan Wikholm Apr 20 '10 at 8:42.

I think the following code will solve the problem, with subtle changes regarding encoding:

StreamReader reader = new StreamReader(@"c:\test.txt", Encoding.ASCII);
reader.BaseStream.Seek(0, SeekOrigin.End);
int count = 0;
while ((count < 10) && (reader.BaseStream.Position > 0))
{
    reader.BaseStream.Position--;
    int c = reader.BaseStream.ReadByte();
    if (reader.BaseStream.Position > 0)
        reader.BaseStream.Position--;
    if (c == '\n')
    {
        ++count;
    }
}
string str = reader.ReadToEnd();
string[] arr = str.Split('\n');
reader.Close();

That is what the Unix tail command does. See en.wikipedia.org/wiki/Tail_(Unix). There are lots of open-source implementations on the internet, and here is one for Win32: Tail for Win32.

You could use the Windows version of the tail command and just pipe its output to a text file with the > symbol, or view it on the screen, depending on what your needs are.

If you open the file with FileMode.Append it will seek to the end of the file for you. Then you could seek back the number of bytes you want and read them. It might not be fast, though, regardless of what you do, since that's a pretty massive file.

One useful method is FileInfo.Length. It gives the size of a file in bytes. What structure does your file have?

Are you sure the last 10 lines will be near the end of the file? If you have a file with 12 lines of text followed by 10GB of zeros, then looking at the end won't really be that fast. Then again, you might have to look through the whole file.

If you are sure that the file contains numerous short strings, each on a new line, seek to the end, then scan back until you've counted 11 ends of lines. Then you can read forward for the next 10 lines.

I think the other posters have all shown that there is no real shortcut. You can either use a tool such as tail (or powershell) or you can write some dumb code that seeks end of file and then looks back for n newlines. There are plenty of implementations of tail out there on the web - take a look at the source code to see how they do it.

Tail is pretty efficient (even on very very large files) and so they must have got it right when they wrote it!
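One reason tail implementations stay fast is that they read backwards in fixed-size blocks rather than issuing a seek per byte. A sketch of that idea, assuming a single-byte-per-character encoding (class and method names here are made up, and the tail portion is assumed to fit in memory):

```csharp
using System;
using System.IO;
using System.Text;

class BlockTail
{
    // Read backwards in blocks, counting newlines, then read the tail forward.
    public static string LastLines(string path, int lineCount, int blockSize = 4096)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long end = fs.Length;
            long pos = end;
            long start = 0;
            int newlines = 0;

            while (pos > 0)
            {
                int size = (int)Math.Min(blockSize, pos);
                pos -= size;
                fs.Seek(pos, SeekOrigin.Begin);
                byte[] block = ReadFully(fs, size);

                // Scan the block backwards, counting line breaks.
                for (int i = size - 1; i >= 0; i--)
                {
                    if (block[i] != (byte)'\n') continue;
                    if (pos + i == end - 1) continue; // a trailing newline ends the last line
                    if (++newlines == lineCount)
                    {
                        start = pos + i + 1;
                        goto done;
                    }
                }
            }
        done:
            fs.Seek(start, SeekOrigin.Begin);
            byte[] tail = ReadFully(fs, (int)(end - start));
            return Encoding.ASCII.GetString(tail);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until full.
    static byte[] ReadFully(Stream s, int count)
    {
        byte[] buf = new byte[count];
        int read = 0;
        while (read < count)
        {
            int n = s.Read(buf, read, count - read);
            if (n == 0) break;
            read += n;
        }
        return buf;
    }
}
```

With a 4KB block size this needs only a handful of reads for a typical tail, no matter how large the file is.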

Open the file and start reading lines. After you've read 10 lines open another pointer, starting at the front of the file, so the second pointer lags the first by 10 lines. Keep reading, moving the two pointers in unison, until the first reaches the end of the file.

Then use the second pointer to read the result. It works with any size file including empty and shorter than the tail length. And it's easy to adjust for any length of tail.

The drawback, of course, is that you end up reading the entire file and that may be exactly what you're trying to avoid.

If the file is 10GB, I think it's safe to say that's exactly what he's trying to avoid :-) – gbjbaanb Dec 29 '08 at 22:15.
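The lagging-pointer idea can also be sketched with a bounded queue instead of a second stream; like the two-pointer version, it still reads the whole file, which is the drawback noted above:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class QueueTail
{
    // Keep only the most recent `n` lines while reading forward.
    // Equivalent in effect to the two-pointer approach: works for empty
    // files and files shorter than the tail length, but reads every line.
    public static IEnumerable<string> LastLines(string path, int n)
    {
        var window = new Queue<string>(n);
        foreach (string line in File.ReadLines(path))
        {
            if (window.Count == n)
                window.Dequeue(); // drop the oldest line once the window is full
            window.Enqueue(line);
        }
        return window;
    }
}
```

File.ReadLines streams lazily, so at most n lines are held in memory at a time.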

If you have a file with an even format per line (such as a DAQ system), you can just use StreamReader to get the length of the file, then take one of the lines (ReadLine()). Divide the total length by the length of the string. Now you have a general long number to represent the number of lines in the file.

The key is that you use ReadLine() prior to getting your data for your array or whatever. This will ensure that you start at the beginning of a new line and don't get any leftover data from the previous one.

StreamReader leader = new StreamReader(GetReadFile);
leader.BaseStream.Position = 0;
StreamReader follower = new StreamReader(GetReadFile);
int count = 0;
string tmper = null;
while (count < 10)
{
    tmper = leader.ReadLine();
    count++;
}
List<string> samples = new List<string>();
follower.ReadLine();
while (!follower.EndOfStream)
{
    led = follower.ReadLine();
    lead = Tokenize(led);
    samples.Add(lead);
}
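Under this answer's fixed-width assumption, you could also estimate the line count directly without reading every line. A sketch, assuming exactly one byte per character, Unix line endings, and perfectly uniform line lengths (the class and method names are invented for the example):

```csharp
using System;
using System.IO;

class FixedRecordTail
{
    // Estimate the number of lines by dividing the file size by the byte
    // length of one sample line. Only valid for fixed-width records with
    // a single-byte encoding and "\n" line endings.
    public static long EstimateLineCount(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string firstLine = reader.ReadLine() ?? string.Empty;
            long bytesPerLine = firstLine.Length + 1; // +1 for the '\n' terminator
            return new FileInfo(path).Length / bytesPerLine;
        }
    }
}
```

Once you have the estimated count, you could seek straight to (count - 10) * bytesPerLine instead of scanning the whole file, which avoids the full read this answer otherwise performs.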

Well duh, obviously you need a bit of code like this... for( line = 0 to 10000000000; line++ ){ t = file.Readline( checkEachCharForSpecialEncoding = true ); if( line > 9999999990 ) output t; } problem solved. Take a quick holiday and wait for your answer.

Why not use File.ReadAllLines, which returns a string array? Then you can get the last 10 lines (or members of the array), which would be a trivial task.

This approach isn't taking into account any encoding issues, and I'm not sure of the exact efficiency of this approach (time taken to complete the method, etc.).

2 the man is asking about a large file, > 10 GB! – Ahmed Said Dec 30 '08 at 8:33.

